Multiple merge candidates for affine motion prediction

ABSTRACT

Different implementations are described, particularly implementations for selecting a predictor candidate from a set of multiple predictor candidates for motion compensation of a picture block based on a motion model. The motion model may be, e.g., an affine model in a merge mode for a video content encoder or decoder. In an embodiment, a predictor candidate is selected from the set based on a motion model for each of the multiple predictor candidates, and may be based on a criterion such as, e.g., a rate distortion cost. The corresponding motion field is determined based on, e.g., one or more corresponding control point motion vectors for the block being encoded or decoded. The corresponding motion field of an embodiment identifies motion vectors used for prediction of sub-blocks of the block being encoded or decoded.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 17/557,179, filed Dec. 21, 2021, which is a continuation of U.S. application Ser. No. 16/622,895, filed Dec. 13, 2019 (U.S. Pat. No. 11,245,921), which is a National Phase entry under 35 U.S.C. § 371 of International Application No. PCT/EP2018/066975, filed Jun. 25, 2018, which claims the benefit of European Patent Application No. 18305386.7, filed Mar. 30, 2018, and European Patent Application No. 17305797.7, filed Jun. 26, 2017, each of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

At least one of the present embodiments generally relates to, e.g., a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus for selecting a predictor candidate from a set of multiple predictor candidates for motion compensation based on a motion model such as, e.g., an affine model, for a video encoder or a video decoder.

BACKGROUND

To achieve high compression efficiency, image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content. Generally, intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded. To reconstruct the video, the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.

A recent addition to high compression technology includes using a motion model based on affine modeling. In particular, affine modeling is used for motion compensation for encoding and decoding of video pictures. In general, affine modeling is a model using at least two parameters such as, e.g., two control point motion vectors (CPMVs) representing the motion at the respective corners of a block of a picture, that allows deriving a motion field for the whole block of a picture to simulate, e.g., rotation and homothety (zoom).

SUMMARY

According to a general aspect of at least one embodiment, a method for video encoding is presented, comprising: determining, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates; selecting a predictor candidate from the set of predictor candidates; determining for the selected predictor candidate from the set of predictor candidates, one or more corresponding control point motion vectors for the block; determining for the selected predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for the selected predictor candidate, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being encoded; encoding the block based on the corresponding motion field for the selected predictor candidate from the set of predictor candidates; and encoding an index for the selected predictor candidate from the set of predictor candidates.

According to another general aspect of at least one embodiment, a method for video decoding is presented, comprising: receiving, for a block being decoded in a picture, an index corresponding to a particular predictor candidate; determining, for the particular predictor candidate, one or more corresponding control point motion vectors for the block being decoded; determining for the particular predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being decoded; and decoding the block based on the corresponding motion field.

According to another general aspect of at least one embodiment, an apparatus for video encoding is presented, comprising: means for determining, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates; means for selecting a predictor candidate from the set of predictor candidates; means for determining for the selected predictor candidate from the set of predictor candidates, one or more corresponding control point motion vectors for the block; means for determining for the selected predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for the selected predictor candidate, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being encoded; means for encoding the block based on the corresponding motion field for the selected predictor candidate from the set of predictor candidates; and means for encoding an index for the selected predictor candidate from the set of predictor candidates.

According to another general aspect of at least one embodiment, an apparatus for video decoding is presented, comprising: means for receiving, for a block being decoded in a picture, an index corresponding to a particular predictor candidate; means for determining, for the particular predictor candidate, one or more corresponding control point motion vectors for the block being decoded; means for determining for the particular predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being decoded; and means for decoding the block based on the corresponding motion field.

According to another general aspect of at least one embodiment, an apparatus for video encoding is provided, comprising one or more processors and at least one memory. The one or more processors are configured to: determine, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates; select a predictor candidate from the set of predictor candidates; determine for the selected predictor candidate from the set of predictor candidates, one or more corresponding control point motion vectors for the block; determine for the selected predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for the selected predictor candidate, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being encoded; encode the block based on the corresponding motion field for the selected predictor candidate from the set of predictor candidates; and encode an index for the selected predictor candidate from the set of predictor candidates. The at least one memory is for storing, at least temporarily, the encoded block and/or the encoded index.

According to another general aspect of at least one embodiment, an apparatus for video decoding is provided, comprising one or more processors and at least one memory. The one or more processors are configured to: receive, for a block being decoded in a picture, an index corresponding to a particular predictor candidate; determine, for the particular predictor candidate, one or more corresponding control point motion vectors for the block being decoded; determine for the particular predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being decoded; and decode the block based on the corresponding motion field. The at least one memory is for storing, at least temporarily, the decoded block.

According to a general aspect of at least one embodiment, a method for video encoding is presented, comprising: determining, for a block being encoded in a picture, a set of predictor candidates; determining for each of multiple predictor candidates from the set of predictor candidates, one or more corresponding control point motion vectors for the block; determining for each of the multiple predictor candidates, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for each of the multiple predictor candidates from the set of predictor candidates; evaluating the multiple predictor candidates according to one or more criteria and based on the corresponding motion field; selecting a predictor candidate from the multiple predictor candidates based on the evaluating; and encoding the block based on the selected predictor candidate from the set of predictor candidates.

According to a general aspect of at least one embodiment, a method for video decoding is presented, comprising retrieving, for a block being decoded in a picture, an index corresponding to a selected predictor candidate. The selected predictor candidate is selected at an encoder by: determining, for a block being encoded in a picture, a set of predictor candidates; determining for each of multiple predictor candidates from the set of predictor candidates, one or more corresponding control point motion vectors for the block being encoded; determining for each of the multiple predictor candidates, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for each of the multiple predictor candidates from the set of predictor candidates; evaluating the multiple predictor candidates according to one or more criteria and based on the corresponding motion field; selecting a predictor candidate from the multiple predictor candidates based on the evaluating; and encoding an index for the selected predictor candidate from the set of predictor candidates. The method further comprises decoding the block based on the index corresponding to the selected predictor candidate.

According to another general aspect of at least one embodiment, a method may further comprise: evaluating the multiple predictor candidates according to one or more criteria and based on corresponding motion fields for each of the multiple predictor candidates; and selecting the predictor candidate from the multiple predictor candidates based on the evaluating.

According to another general aspect of at least one embodiment, an apparatus may further comprise: means for evaluating the multiple predictor candidates according to one or more criteria and based on corresponding motion fields for each of the multiple predictor candidates; and means for selecting the predictor candidate from the multiple predictor candidates based on the evaluating.

According to another general aspect of at least one embodiment, the one or more criteria are based on a rate distortion determination corresponding to one or more of the multiple predictor candidates from the set of predictor candidates.
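As a rough illustration of a rate distortion criterion, the sketch below picks the candidate with the smallest Lagrangian cost J = D + λ·R. The distortion measure (sum of squared errors), the per-candidate rate estimates, and the function names are assumptions made for illustration. This is not the embodiment's actual cost function.

```python
from typing import List, Sequence


def sse(a: Sequence[int], b: Sequence[int]) -> int:
    """Distortion: sum of squared differences between two sample blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_candidate(original: Sequence[int],
                     predictions: List[Sequence[int]],
                     rates: List[float],
                     lam: float) -> int:
    """Index of the candidate with the lowest cost J = D + lam * R.

    predictions[i] is the block predicted with candidate i's motion field;
    rates[i] is an estimate of the bits spent on candidate i (hypothetical inputs).
    """
    costs = [sse(original, p) + lam * r for p, r in zip(predictions, rates)]
    return min(range(len(costs)), key=lambda i: costs[i])
```

A larger λ favors candidates that are cheap to signal, and a smaller λ favors the candidate whose motion field predicts the block best.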

According to another general aspect of at least one embodiment, decoding or encoding the block based on the corresponding motion field comprises decoding or encoding, respectively, based on predictors for the sub-blocks, the predictors being indicated by the motion vectors.

According to another general aspect of at least one embodiment, the set of predictor candidates comprises spatial candidates, and/or temporal candidates, of the block being encoded or decoded.

According to another general aspect of at least one embodiment, the motion model is an affine model.

According to another general aspect of at least one embodiment, the corresponding motion field for each position (x, y) inside the block being encoded or decoded is determined by:

$\left\{ \begin{matrix}{v_{x} = {{\frac{\left( {v_{1x} - v_{0x}} \right)}{w}x} - {\frac{\left( {v_{1y} - v_{0y}} \right)}{w}y} + v_{0x}}} \\{v_{y} = {{\frac{\left( {v_{1y} - v_{0y}} \right)}{w}x} + {\frac{\left( {v_{1x} - v_{0x}} \right)}{w}y} + v_{0y}}}\end{matrix} \right.$

wherein (v_(0x), v_(0y)) and (v_(1x), v_(1y)) are the control point motion vectors used to generate the corresponding motion field, (v_(0x), v_(0y)) corresponds to the control point motion vector of the top-left corner of the block being encoded or decoded, (v_(1x), v_(1y)) corresponds to the control point motion vector of the top-right corner of the block being encoded or decoded, and w is the width of the block being encoded or decoded.
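The formula above can be evaluated directly to obtain one motion vector per sub-block. In the minimal sketch below, the 4×4 sub-block size and sampling the model at each sub-block's center are assumptions. The floating-point arithmetic is also a simplification: a real codec would work at a fixed sub-sample precision.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def affine_mv(v0: MV, v1: MV, w: int, x: float, y: float) -> MV:
    """Evaluate the four-parameter affine model at position (x, y)."""
    a = (v1[0] - v0[0]) / w  # (v1x - v0x) / w
    b = (v1[1] - v0[1]) / w  # (v1y - v0y) / w
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return (vx, vy)


def affine_motion_field(v0: MV, v1: MV, w: int, h: int,
                        sub: int = 4) -> List[List[MV]]:
    """One motion vector per sub x sub sub-block, sampled at its center."""
    field = []
    for j in range(h // sub):
        row = []
        for i in range(w // sub):
            cx = i * sub + sub / 2
            cy = j * sub + sub / 2
            row.append(affine_mv(v0, v1, w, cx, cy))
        field.append(row)
    return field
```

At the top-left corner (0, 0) the model returns v0, and at the top-right corner (w, 0) it returns v1, which is a quick consistency check on the two control points.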

According to another general aspect of at least one embodiment, the number of the spatial candidates is at least 5.

According to another general aspect of at least one embodiment, one or more additional control point motion vectors are added for the determining of the corresponding motion field based on a function of the determined one or more corresponding control point motion vectors.

According to another general aspect of at least one embodiment, the function includes one or more of: 1) mean, 2) weighted mean, 3) unique mean, 4) average, 5) median, or 6) uni-directional part of one of 1) to 5) above, of the determined one or more corresponding control point motion vectors.
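As an illustration of two of the listed functions, the sketch below forms extra candidates from the component-wise mean and median of already-determined CPMV pairs. Each pair is (top-left, top-right). A new candidate is appended only if it is not already present. The names, the duplicate check, and the choice to combine all stored candidates are assumptions. Weighted mean, unique mean, and uni-directional variants are not modelled.

```python
from statistics import mean, median
from typing import Callable, List, Sequence, Tuple

MV = Tuple[float, float]
CPMVs = Tuple[MV, MV]  # (top-left v0, top-right v1) of one predictor candidate


def combine(cands: List[CPMVs], fn: Callable[[Sequence[float]], float]) -> CPMVs:
    """Apply fn (e.g., mean or median) component-wise across all candidates."""
    out = []
    for k in (0, 1):  # control point: 0 = top-left, 1 = top-right
        xs = [c[k][0] for c in cands]
        ys = [c[k][1] for c in cands]
        out.append((fn(xs), fn(ys)))
    return out[0], out[1]


def add_mean_and_median(cands: List[CPMVs]) -> List[CPMVs]:
    """Append mean and median CPMV candidates when they are new."""
    extended = list(cands)
    if not cands:
        return extended
    for fn in (mean, median):
        extra = combine(cands, fn)
        if extra not in extended:
            extended.append(extra)
    return extended
```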

According to another general aspect of at least one embodiment, a non-transitory computer readable medium is presented containing data content generated according to the method or the apparatus of any of the preceding descriptions.

According to another general aspect of at least one embodiment, a signal is provided comprising video data generated according to the method or the apparatus of any of the preceding descriptions.

One or more of the present embodiments also provide a computer readable storage medium having stored thereon instructions for encoding or decoding video data according to any of the methods described above. The present embodiments also provide a computer readable storage medium having stored thereon a bitstream generated according to the methods described above. The present embodiments also provide a method and apparatus for transmitting the bitstream generated according to the methods described above. The present embodiments also provide a computer program product including instructions for performing any of the methods described.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a block diagram of an embodiment of an HEVC (High Efficiency Video Coding) video encoder.

FIG. 2A is a pictorial example depicting HEVC reference sample generation.

FIG. 2B is a pictorial example depicting intra prediction directions in HEVC.

FIG. 3 illustrates a block diagram of an embodiment of an HEVC video decoder.

FIG. 4 illustrates an example of Coding Tree Unit (CTU) and Coding Tree (CT) concepts to represent a compressed HEVC picture.

FIG. 5 illustrates an example of divisions of a Coding Tree Unit (CTU) into Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs).

FIG. 6 illustrates an example of an affine model as the motion model used in the Joint Exploration Model (JEM).

FIG. 7 illustrates an example of a 4×4 sub-CU based affine motion vector field used in the Joint Exploration Model (JEM).

FIG. 8A illustrates an example of motion vector prediction candidates for Affine Inter CUs.

FIG. 8B illustrates an example of motion vector prediction candidates in the Affine Merge mode.

FIG. 9 illustrates an example of spatial derivation of affine control point motion vectors in the case of the Affine Merge mode motion model.

FIG. 10 illustrates an example method according to a general aspect of at least one embodiment.

FIG. 11 illustrates another example method according to a general aspect of at least one embodiment.

FIG. 12 also illustrates another example method according to a general aspect of at least one embodiment.

FIG. 13 also illustrates another example method according to a general aspect of at least one embodiment.

FIG. 14 illustrates an example of a known process for evaluating the Affine Merge mode of an inter-CU in JEM.

FIG. 15 illustrates an example of a process for choosing the predictor candidate in the Affine Merge mode in JEM.

FIG. 16 illustrates an example of propagated affine motion fields through an Affine Merge predictor candidate located to the left of the current block being encoded or decoded.

FIG. 17 illustrates an example of propagated affine motion fields through an Affine Merge predictor candidate located to the top and right of the current block being encoded or decoded.

FIG. 18 illustrates an example of a predictor candidate selection process according to a general aspect of at least one embodiment.

FIG. 19 illustrates an example of a process to build a set of multiple predictor candidates according to a general aspect of at least one embodiment.

FIG. 20 illustrates an example of a derivation process of top-left and top-right corner CPMVs for each predictor candidate according to a general aspect of at least one embodiment.

FIG. 21 illustrates an example of an augmented set of spatial predictor candidates according to a general aspect of at least one embodiment.

FIG. 22 illustrates another example of a process to build a set of multiple predictor candidates according to a general aspect of at least one embodiment.

FIG. 23 also illustrates another example of a process to build a set of multiple predictor candidates according to a general aspect of at least one embodiment.

FIG. 24 illustrates an example of how temporal candidates may be used for predictor candidates according to a general aspect of at least one embodiment.

FIG. 25 illustrates an example of a process of adding the mean CPMV motion vectors computed from the stored CPMV candidates to the final CPMV candidate set according to a general aspect of at least one embodiment.

FIG. 26 illustrates a block diagram of an example apparatus in which various aspects of the embodiments may be implemented.

DETAILED DESCRIPTION

FIG. 1 illustrates an exemplary High Efficiency Video Coding (HEVC) encoder 100. HEVC is a compression standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) (see, e.g., “ITU-T H.265 TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (10/2014), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services – Coding of moving video, High efficiency video coding, Recommendation ITU-T H.265”).

In HEVC, to encode a video sequence with one or more pictures, a picture is partitioned into one or more slices where each slice can include one or more slice segments. A slice segment is organized into coding units, prediction units, and transform units.

In the present application, the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” and “coded” may be used interchangeably, and the terms “picture” and “frame” may be used interchangeably. Usually, but not necessarily, the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.

The HEVC specification distinguishes between “blocks” and “units,” where a “block” addresses a specific area in a sample array (e.g., luma, Y), and the “unit” includes the collocated blocks of all encoded color components (Y, Cb, Cr, or monochrome), syntax elements, and prediction data that are associated with the blocks (e.g., motion vectors).

For coding, a picture is partitioned into coding tree blocks (CTB) of square shape with a configurable size, and a consecutive set of coding tree blocks is grouped into a slice. A Coding Tree Unit (CTU) contains the CTBs of the encoded color components. A CTB is the root of a quadtree partitioning into Coding Blocks (CB), and a Coding Block may be partitioned into one or more Prediction Blocks (PB) and forms the root of a quadtree partitioning into Transform Blocks (TBs). Corresponding to the Coding Block, Prediction Block, and Transform Block, a Coding Unit (CU) includes the Prediction Units (PUs) and the tree-structured set of Transform Units (TUs), a PU includes the prediction information for all color components, and a TU includes residual coding syntax structure for each color component. The size of a CB, PB, and TB of the luma component applies to the corresponding CU, PU, and TU. In the present application, the term “block” can be used to refer, for example, to any of CTU, CU, PU, TU, CB, PB, and TB. In addition, the “block” can also be used to refer to a macroblock and a partition as specified in H.264/AVC or other video coding standards, and more generally to refer to an array of data of various sizes.
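The recursive quadtree partitioning of a CTB into coding blocks can be sketched as follows. The `split` callback is a hypothetical stand-in. At the encoder it would be a rate distortion decision. At the decoder it would be the parsed split flags. The minimum block size is likewise a parameter chosen for illustration.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples


def quadtree_leaves(x: int, y: int, size: int, min_size: int,
                    split: Callable[[int, int, int], bool]) -> List[Block]:
    """Split a square block into four quadrants while split() returns True.

    Returns the leaf blocks (e.g., the coding blocks of a CTB) in z-scan
    order: top-left, top-right, bottom-left, bottom-right.
    """
    if size > min_size and split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half, min_size, split)
        return leaves
    return [(x, y, size)]
```

For example, splitting a 64×64 CTB once yields four 32×32 coding blocks. Further splitting only the first of them yields seven leaves whose areas still sum to 64×64.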

In the exemplary encoder 100, a picture is encoded by the encoder elements as described below. The picture to be encoded is processed in units of CUs. Each CU is encoded using either an intra or inter mode. When a CU is encoded in an intra mode, it performs intra prediction (160). In an inter mode, motion estimation (175) and compensation (170) are performed. The encoder decides (105) which one of the intra mode or inter mode to use for encoding the CU, and indicates the intra/inter decision by a prediction mode flag. Prediction residuals are calculated by subtracting (110) the predicted block from the original image block.

CUs in intra mode are predicted from reconstructed neighboring samples within the same slice. A set of 35 intra prediction modes is available in HEVC, including a DC, a planar, and 33 angular prediction modes. The intra prediction reference is reconstructed from the row and column adjacent to the current block. The reference extends over two times the block size in the horizontal and vertical directions using available samples from previously reconstructed blocks. When an angular prediction mode is used for intra prediction, reference samples can be copied along the direction indicated by the angular prediction mode.

The applicable luma intra prediction mode for the current block can be coded using two different options. If the applicable mode is included in a constructed list of three most probable modes (MPM), the mode is signaled by an index in the MPM list. Otherwise, the mode is signaled by a fixed-length binarization of the mode index. The three most probable modes are derived from the intra prediction modes of the top and left neighboring blocks.
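The two signaling options can be sketched as follows. The sketch is modelled on our understanding of the HEVC rules, with planar = 0, DC = 1, and vertical = 26. The neighbor availability rules are reduced to a `None` placeholder that is treated as DC, and the function names are hypothetical. Removing the three MPMs leaves 32 modes, which is why a 5-bit fixed-length code suffices.

```python
from typing import List, Optional, Tuple

PLANAR, DC, VERTICAL = 0, 1, 26  # HEVC luma mode numbers


def mpm_list(left: Optional[int], above: Optional[int]) -> List[int]:
    """Three most probable modes from the left and above neighbor modes.

    None stands for an unavailable or non-intra neighbor, treated as DC.
    """
    a = DC if left is None else left
    b = DC if above is None else above
    if a == b:
        if a < 2:  # both planar or both DC
            return [PLANAR, DC, VERTICAL]
        # the angular mode plus its two neighboring angular directions
        return [a, 2 + ((a + 29) % 32), 2 + ((a - 2 + 1) % 32)]
    if PLANAR not in (a, b):
        third = PLANAR
    elif DC not in (a, b):
        third = DC
    else:
        third = VERTICAL
    return [a, b, third]


def code_mode(mode: int, mpms: List[int]) -> Tuple[str, int]:
    """Signal a mode (0..34) as an MPM index or as a 5-bit remaining value."""
    if mode in mpms:
        return ("mpm", mpms.index(mode))
    # 35 - 3 = 32 non-MPM modes, renumbered 0..31 by skipping the MPMs
    return ("rem", mode - sum(1 for m in mpms if m < mode))


def decode_rem(rem: int, mpms: List[int]) -> int:
    """Decoder side: undo the renumbering done by code_mode."""
    mode = rem
    for m in sorted(mpms):
        if mode >= m:
            mode += 1
    return mode
```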

For an inter CU, the corresponding coding block is further partitioned into one or more prediction blocks. Inter prediction is performed on the PB level, and the corresponding PU contains the information about how inter prediction is performed. The motion information (i.e., motion vector and reference picture index) can be signaled in two ways, namely, “merge mode” and “advanced motion vector prediction (AMVP)”.

In the merge mode, a video encoder or decoder assembles a candidate list based on already coded blocks, and the video encoder signals an index for one of the candidates in the candidate list. At the decoder side, the motion vector (MV) and the reference picture index are reconstructed based on the signaled candidate.

The set of possible candidates in the merge mode consists of spatial neighbor candidates, a temporal candidate, and generated candidates. FIG. 2A shows the positions of five spatial candidates {a₁, b₁, b₀, a₀, b₂} for a current block 210, wherein a₀ and a₁ are to the left of the current block, and b₁, b₀, b₂ are at the top of the current block. For each candidate position, the availability is checked according to the order of a₁, b₁, b₀, a₀, b₂, and then the redundancy in candidates is removed.
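The availability check and redundancy removal can be sketched as below. Motion is simplified to a single (mv_x, mv_y, ref_idx) triple, and `None` marks an unavailable position. Each candidate is compared with all previously kept ones. That full comparison is a simplification of this sketch, not a statement about how many comparisons a particular codec performs.

```python
from typing import Dict, List, Optional, Tuple

Motion = Tuple[int, int, int]  # (mv_x, mv_y, ref_idx), single list for brevity

ORDER = ["a1", "b1", "b0", "a0", "b2"]


def spatial_merge_candidates(neighbors: Dict[str, Optional[Motion]]) -> List[Motion]:
    """Check positions in the order a1, b1, b0, a0, b2 and drop duplicates."""
    out: List[Motion] = []
    for pos in ORDER:
        motion = neighbors.get(pos)
        if motion is None:  # unavailable: outside picture, intra, not yet coded
            continue
        if motion in out:   # redundancy removal
            continue
        out.append(motion)
    return out
```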

The motion vector of the collocated location in a reference picture can be used for derivation of a temporal candidate. The applicable reference picture is selected on a slice basis and indicated in the slice header, and the reference index for the temporal candidate is set to i_(ref)=0. If the POC distance (td) between the picture of the collocated PU and the reference picture from which the collocated PU is predicted is the same as the distance (tb) between the current picture and the reference picture of the current PU (i.e., the one with index i_(ref)=0), the collocated motion vector mv_(col) can be directly used as the temporal candidate. Otherwise, a scaled motion vector, tb/td*mv_(col), is used as the temporal candidate. Depending on where the current PU is located, the collocated PU is determined by the sample location at the bottom-right or at the center of the current PU.
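The scaling tb/td*mv_(col) is done in integer arithmetic in a codec. The sketch below uses a fixed-point approximation modelled on our understanding of the one in the H.265 text. A factor of roughly 256·tb/td is computed first, and the product is then shifted right by 8 with rounding. Treat the constants and clipping ranges as illustrative. td is assumed to be non-zero.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))


def div_trunc(a: int, b: int) -> int:
    """Integer division rounding toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def scale_mv_component(mv: int, tb: int, td: int) -> int:
    """Approximate tb / td * mv for one MV component in fixed point."""
    if tb == td:
        return mv  # same POC distance: mv_col is used directly
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    tx = div_trunc(16384 + abs(td) // 2, td)
    factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)  # about 256 * tb / td
    prod = factor * mv
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))
```

For instance, with tb = 1 and td = 2 a component of 10 becomes 5, and with td = −1 the sign of the vector flips.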

The maximum number of merge candidates, N, is specified in the slice header. If the number of merge candidates is larger than N, only the first N−1 spatial candidates and the temporal candidate are used. Otherwise, if the number of merge candidates is less than N, the set of candidates is filled up to the maximum number N with generated candidates as combinations of already present candidates, or null candidates. The candidates used in the merge mode may be referred to as “merge candidates” in the present application.
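The capping and filling rule can be sketched as below. Generated candidates built as combinations of existing ones are omitted. Padding uses only null (zero-motion) candidates, which is a simplification. When no temporal candidate exists, the sketch keeps the first N spatial candidates, which is one reading of the rule.

```python
from typing import List, Optional, Tuple

Motion = Tuple[int, int, int]  # (mv_x, mv_y, ref_idx)

NULL_CANDIDATE: Motion = (0, 0, 0)


def finalize_merge_list(spatial: List[Motion], temporal: Optional[Motion],
                        n: int) -> List[Motion]:
    """Trim or pad the merge list so that it holds exactly n candidates."""
    temporal_list = [temporal] if temporal is not None else []
    if len(spatial) + len(temporal_list) > n:
        # keep the first n - 1 spatial candidates plus the temporal one
        cands = spatial[: n - len(temporal_list)] + temporal_list
    else:
        cands = spatial + temporal_list
    while len(cands) < n:  # pad with null candidates
        cands.append(NULL_CANDIDATE)
    return cands
```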

If a CU indicates a skip mode, the applicable index for the merge candidate is indicated only if the list of merge candidates is larger than 1, and no further information is coded for the CU. In the skip mode, the motion vector is applied without a residual update.

In AMVP, a video encoder or decoder assembles candidate lists based on motion vectors determined from already coded blocks. The video encoder then signals an index in the candidate list to identify a motion vector predictor (MVP) and signals a motion vector difference (MVD). At the decoder side, the motion vector (MV) is reconstructed as MVP+MVD. The applicable reference picture index is also explicitly coded in the PU syntax for AMVP.

Only two spatial motion candidates are chosen in AMVP. The first spatial motion candidate is chosen from left positions {a₀, a₁} and the second one from the above positions {b₀, b₁, b₂}, while keeping the searching order as indicated in the two sets. If the number of motion vector candidates is not equal to two, the temporal MV candidate can be included. If the set of candidates is still not fully filled, then zero motion vectors are used.
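The construction of the two-entry AMVP list can be sketched as follows. The reference picture scaling of spatial candidates is left out. The sketch also assumes that two identical spatial predictors are merged into one before the temporal candidate is considered.

```python
from typing import List, Optional, Sequence, Tuple

MV = Tuple[int, int]


def first_available(cands: Sequence[Optional[MV]]) -> Optional[MV]:
    """Return the first available MV in search order, or None."""
    return next((mv for mv in cands if mv is not None), None)


def amvp_list(left: Sequence[Optional[MV]], above: Sequence[Optional[MV]],
              temporal: Optional[MV]) -> List[MV]:
    """Build a two-entry AMVP predictor list.

    left is searched in the order a0, a1; above in the order b0, b1, b2.
    """
    cands: List[MV] = []
    for group in (left, above):
        mv = first_available(group)
        if mv is not None and mv not in cands:
            cands.append(mv)
    if len(cands) < 2 and temporal is not None:
        cands.append(temporal)
    while len(cands) < 2:  # fill with zero motion vectors
        cands.append((0, 0))
    return cands[:2]
```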

If the reference picture index of a spatial candidate corresponds to the reference picture index for the current PU (i.e., using the same reference picture index or both using long-term reference pictures, independently of the reference picture list), the spatial candidate motion vector is used directly. Otherwise, if both reference pictures are short-term ones, the candidate motion vector is scaled according to the distance (tb) between the current picture and the reference picture of the current PU and the distance (td) between the current picture and the reference picture of the spatial candidate. The candidates used in the AMVP mode may be referred to as “AMVP candidates” in the present application.

For ease of notation, a block tested with the “merge” mode at the encoder side or a block decoded with the “merge” mode at the decoder side is denoted as a “merge” block, and a block tested with the AMVP mode at the encoder side or a block decoded with the AMVP mode at the decoder side is denoted as an “AMVP” block.

FIG. 2B illustrates an exemplary motion vector representation using AMVP. For a current block 240 to be encoded, a motion vector (MV_(current)) can be obtained through motion estimation. Using the motion vector (MV_(left)) from a left block 230 and the motion vector (MV_(above)) from the above block 220, a motion vector predictor can be chosen from MV_(left) and MV_(above) as MVP_(current). A motion vector difference then can be calculated as MVD_(current)=MV_(current)−MVP_(current).
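The encoder/decoder round trip for AMVP can be sketched as follows. The encoder picks the predictor and sends the index plus MVD = MV − MVP, and the decoder recovers MV = MVP + MVD. Using the sum of absolute MVD components as a stand-in for the bit cost is an assumption. A real encoder would use an entropy-coder-based estimate.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def mvd_cost(mv: MV, mvp: MV) -> int:
    """Crude proxy for the bits needed to code MV - MVP."""
    return abs(mv[0] - mvp[0]) + abs(mv[1] - mvp[1])


def encode_amvp(mv: MV, mvps: List[MV]) -> Tuple[int, MV]:
    """Encoder side: choose the predictor index and compute MVD = MV - MVP."""
    idx = min(range(len(mvps)), key=lambda i: mvd_cost(mv, mvps[i]))
    mvp = mvps[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_amvp(idx: int, mvd: MV, mvps: List[MV]) -> MV:
    """Decoder side: MV = MVP + MVD."""
    mvp = mvps[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```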

Motion compensation prediction can be performed using one or two reference pictures for prediction. In P slices, only a single prediction reference can be used for Inter prediction, enabling uni-prediction for a prediction block. In B slices, two reference picture lists are available, and uni-prediction or bi-prediction can be used. In bi-prediction, one reference picture from each of the reference picture lists is used.

In HEVC, the precision of the motion information for motion compensation is one quarter-sample (also referred to as quarter-pel or 1/4-pel) for the luma component and one eighth-sample (also referred to as 1/8-pel) for the chroma components for the 4:2:0 configuration. A 7-tap or 8-tap interpolation filter is used for interpolation of fractional-sample positions, i.e., 1/4, 1/2 and 3/4 of full sample locations in both horizontal and vertical directions can be addressed for luma.
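A motion vector component stored in quarter-sample units can be split into an integer-sample offset and a fractional phase that selects the interpolation filter. The sketch below shows this split. In 4:2:0 the chroma planes have half the resolution, so the same stored value read in eighth-sample units addresses chroma. The helper name is hypothetical.

```python
from typing import Tuple


def split_mv(mv: int, frac_bits: int) -> Tuple[int, int]:
    """Split an MV component into integer-sample offset and fractional phase.

    frac_bits = 2 for quarter-sample luma, 3 for eighth-sample chroma (4:2:0).
    The arithmetic shift floors, so negative MVs keep a non-negative phase.
    """
    return mv >> frac_bits, mv & ((1 << frac_bits) - 1)
```

For example, a stored value of 5 means 1 + 1/4 luma samples (offset 1, phase 1) and 5/8 of a chroma sample (offset 0, phase 5), which is the same displacement at half resolution.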

The prediction residuals are then transformed (125) and quantized (130). The quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (145) to output a bitstream. The encoder may also skip the transform and apply quantization directly to the non-transformed residual signal on a 4×4 TU basis. The encoder may also bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization process. In direct PCM coding, no prediction is applied and the coding unit samples are directly coded into the bitstream.

The encoder decodes an encoded block to provide a reference for further predictions. The quantized transform coefficients are de-quantized (140) and inverse transformed (150) to decode prediction residuals. Combining (155) the decoded prediction residuals and the predicted block, an image block is reconstructed. In-loop filters (165) are applied to the reconstructed picture, for example, to perform deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts. The filtered image is stored at a reference picture buffer (180).
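The residual computation (110) and the reconstruction step (155 at the encoder, 355 at the decoder) reduce to per-sample arithmetic, sketched below. Transform and quantization are left out, so the round trip here is lossless. With quantization, the decoded residual would only approximate the original one. Clipping to the sample range for a given bit depth is an assumption of the sketch.

```python
from typing import List


def residual(orig: List[int], pred: List[int]) -> List[int]:
    """Encoder side (110): prediction residual = original - prediction."""
    return [o - p for o, p in zip(orig, pred)]


def reconstruct(pred: List[int], res: List[int], bit_depth: int = 8) -> List[int]:
    """Combine prediction and decoded residual, clipped to the sample range."""
    hi = (1 << bit_depth) - 1
    return [min(hi, max(0, p + r)) for p, r in zip(pred, res)]
```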

FIG. 3 illustrates a block diagram of an exemplary HEVC video decoder 300. In the exemplary decoder 300, a bitstream is decoded by the decoder elements as described below. Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass described in FIG. 1, which itself performs video decoding as part of encoding video data.

In particular, the input of the decoder includes a video bitstream, which may be generated by video encoder 100. The bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information. The transform coefficients are de-quantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed. The predicted block may be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375). As described above, AMVP and merge mode techniques may be used to derive motion vectors for motion compensation, which may use interpolation filters to calculate interpolated values for sub-integer samples of a reference block. In-loop filters (365) are applied to the reconstructed image. The filtered image is stored at a reference picture buffer (380).

As mentioned, in HEVC, motion compensated temporal prediction is employed to exploit the redundancy that exists between successive pictures of a video. To do that, a motion vector is associated with each prediction unit (PU). As explained above, each CTU is represented by a Coding Tree in the compressed domain. This is a quad-tree division of the CTU, where each leaf is called a Coding Unit (CU), as also illustrated in FIG. 4 for CTUs 410 and 420. Each CU is then given some Intra or Inter prediction parameters as prediction information. To do so, a CU may be spatially partitioned into one or more Prediction Units (PUs), each PU being assigned some prediction information. The Intra or Inter coding mode is assigned on the CU level. These concepts are further illustrated in FIG. 5 for an exemplary CTU 500 and a CU 510.

In HEVC, one motion vector is assigned to each PU. This motion vector is used for motion compensated temporal prediction of the considered PU. Therefore, in HEVC, the motion model that links a predicted block and its reference block simply consists of a translation or calculation based on the reference block and the corresponding motion vector.

To make improvements to HEVC, the reference software and/or documentation JEM (Joint Exploration Model) is being developed by the Joint Video Exploration Team (JVET). In one JEM version (e.g., “Algorithm Description of Joint Exploration Test Model 5”, Document JVET-E1001_v2, Joint Video Exploration Team of ISO/IEC JTC1/SC29/WG11, meeting, 12-20 Jan. 2017, Geneva, CH), some further motion models are supported to improve temporal prediction. To do so, a PU can be spatially divided into sub-PUs and a model can be used to assign each sub-PU a dedicated motion vector.

In more recent versions of the JEM (e.g., “Algorithm Description of Joint Exploration Test Model 2”, Document JVET-B1001_v3, Joint Video Exploration Team of ISO/IEC JTC1/SC29/WG11, 2nd meeting, 20-26 Feb. 2016, San Diego, USA), a CU is no longer specified to be divided into PUs or TUs. Instead, more flexible CU sizes may be used, and some motion data are directly assigned to each CU. In this new codec design under the newer versions of JEM, a CU may be divided into sub-CUs and a motion vector may be computed for each sub-CU of the divided CU.

One of the new motion models introduced in the JEM is the use of an affine model as the motion model to represent the motion vectors in a CU. The motion model used is illustrated by FIG. 6 and is represented by Equation 1 as shown below. The affine motion field comprises the following motion vector component values for each position (x, y) inside the considered block 600 of FIG. 6:

$\left\{ \begin{matrix}{v_{x} = {{\frac{\left( {v_{1x} - v_{0x}} \right)}{w}x} - {\frac{\left( {v_{1y} - v_{0y}} \right)}{w}y} + v_{0x}}} \\{v_{y} = {{\frac{\left( {v_{1y} - v_{0y}} \right)}{w}x} + {\frac{\left( {v_{1x} - v_{0x}} \right)}{w}y} + v_{0y}}}\end{matrix} \right.$

Equation 1: Affine Motion Model Used to Generate the Motion Field Inside a CU for Prediction

where $(v_{0x}, v_{0y})$ and $(v_{1x}, v_{1y})$ are the control point motion vectors used to generate the corresponding motion field, $(v_{0x}, v_{0y})$ corresponds to the control point motion vector of the top-left corner of the block being encoded or decoded, $(v_{1x}, v_{1y})$ corresponds to the control point motion vector of the top-right corner of the block being encoded or decoded, and $w$ is the width of the block being encoded or decoded.

To reduce complexity, a motion vector is computed for each 4×4 sub-block (sub-CU) of the considered CU 700, as illustrated in FIG. 7. An affine motion vector is computed from the control point motion vectors at the center position of each sub-block. The obtained MV is represented at 1/16-pel accuracy. As a result, the compensation of a coding unit in the affine mode consists of motion compensated prediction of each sub-block with its own motion vector. These sub-block motion vectors are shown as an arrow for each of the sub-blocks in FIG. 7.
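The per-sub-block computation described above can be illustrated with a short Python sketch. It evaluates Equation 1 at the center of each 4×4 sub-block and rounds the result to 1/16-pel precision. The function name `affine_subblock_mvs`, the use of floating-point values in pel units, and the plain rounding step are assumptions made for readability; a real codec would typically use fixed-point integer arithmetic instead.

```python
def affine_subblock_mvs(v0, v1, w, h, sub=4):
    """Per-sub-block motion vectors from Equation 1 (4-parameter affine model).

    v0, v1: top-left and top-right control point MVs as (vx, vy), in pels.
    w, h:   block width and height, in pels.
    Returns {(x0, y0): (vx, vy)} keyed by each sub-block's top-left corner.
    """
    a = (v1[0] - v0[0]) / w  # (v1x - v0x) / w
    b = (v1[1] - v0[1]) / w  # (v1y - v0y) / w
    field = {}
    for y0 in range(0, h, sub):
        for x0 in range(0, w, sub):
            # Evaluate the affine model at the center of the sub-block.
            x = x0 + sub / 2
            y = y0 + sub / 2
            vx = a * x - b * y + v0[0]
            vy = b * x + a * y + v0[1]
            # Round to 1/16-pel precision (assumption: plain rounding).
            field[(x0, y0)] = (round(vx * 16) / 16, round(vy * 16) / 16)
    return field


# Zoom example: the MVs grow linearly with the distance from the top-left corner.
print(affine_subblock_mvs((0, 0), (4, 0), 8, 8))
# {(0, 0): (1.0, 1.0), (4, 0): (3.0, 1.0), (0, 4): (1.0, 3.0), (4, 4): (3.0, 3.0)}
```

When $\overrightarrow{v_0} = \overrightarrow{v_1}$ the model degenerates to a pure translation and every sub-block receives the same vector, which matches the HEVC translational model.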

Since, in the JEM, the seeds are saved within the corresponding 4×4 sub-blocks, the affine mode can only be used for CUs with a width and a height larger than 4 (to have independent sub-blocks for each seed). For example, in a 64×4 CU there is only one left sub-block to save both the top-left and the bottom-left seeds, and in a 4×32 CU there is only one top sub-block for both the top-left and top-right seeds; in the JEM, it is not possible to correctly save the seeds in such thin CUs. With our proposal, since seeds are saved separately, we are able to process such thin CUs with a width or a height equal to 4.

Referring back to the example of FIG. 7, an affine CU is defined by its associated affine model composed of three motion vectors, called the affine model seeds, which are the motion vectors from the top-left, top-right and bottom-left corners of the CU (v0, v1 and v2 in FIG. 7). This affine model then allows calculating the affine motion vector field within the CU, which is performed on a 4×4 sub-block basis (black motion vectors in FIG. 7). In the JEM, these seeds are attached to the top-left, top-right and bottom-left 4×4 sub-blocks in the considered CU. In the proposed solution, the affine model seeds are stored separately as motion information associated with the whole CU (like the IC flag, for example). The motion model is thus decoupled from the motion vectors used for actual motion compensation at the 4×4 block level. This new storage may allow saving the complete motion vector field at the 4×4 sub-block level. It also allows using affine motion compensation for blocks of size 4 in width or height.

Affine motion compensation may be used in 2 ways in the JEM: Affine Inter (AF_INTER) mode and Affine Merge mode. They are introduced in the following sections.

Affine Inter (AF_INTER) mode: A CU in AMVP mode whose size is larger than 8×8 may be predicted in Affine Inter mode. This is signaled through a flag in the bit-stream. The generation of the Affine Motion Field for that inter CU includes determining control point motion vectors (CPMVs), which are obtained by the decoder through the addition of a motion vector differential and a control point motion vector prediction (CPMVP). The CPMVPs are a pair of motion vector candidates, respectively taken from the sets (A, B, C) and (D, E) illustrated in FIG. 8A for a current CU 800 being encoded or decoded.

Affine Merge mode: In Affine Merge mode, a CU-level flag indicates whether a merge CU employs affine motion compensation. If so, then the first available neighboring CU that has been coded in an Affine mode is selected among the ordered set of candidate positions A, B, C, D, E of FIG. 8B for a current CU 880 being encoded or decoded. Note that this ordered set of candidate positions in JEM is the same as the spatial neighbor candidates in the merge mode in HEVC, as shown in FIG. 2A and as explained previously.

Once the first neighboring CU in Affine mode is obtained, the 3 CPMVs $\overrightarrow{v_2}$, $\overrightarrow{v_3}$ and $\overrightarrow{v_4}$ from the top-left, top-right and bottom-left corners of the neighboring affine CU are retrieved or calculated. For example, FIG. 9 shows this first determined neighboring CU 910 in Affine mode, located at position A of FIG. 8B, for a current CU 900 being encoded or decoded. Based on these three CPMVs of the neighboring CU 910, the two CPMVs of the top-left and top-right corners of the current CU 900 are derived as follows:

$\begin{aligned}\overrightarrow{v_{0}} &= \overrightarrow{v_{2}} + \left(\overrightarrow{v_{4}} - \overrightarrow{v_{2}}\right)\left(\frac{Y_{curr} - Y_{neighb}}{H_{neighb}}\right) + \left(\overrightarrow{v_{3}} - \overrightarrow{v_{2}}\right)\left(\frac{X_{curr} - X_{neighb}}{W_{neighb}}\right)\\ \overrightarrow{v_{1}} &= \overrightarrow{v_{0}} + \left(\overrightarrow{v_{3}} - \overrightarrow{v_{2}}\right)\left(\frac{W_{curr}}{W_{neighb}}\right)\end{aligned}$

Equation 2: Derivation of CPMVs of the Current CU Based on the Three Control-Point Motion Vectors of the Selected Neighboring CU
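Equation 2 can be transcribed directly into a small Python helper, applied separately to the horizontal and vertical components. This is a minimal sketch: the helper name and the `(X, Y, W, H)` tuple layout, taken here to be the top-left position and the size of each CU, are assumptions for illustration.

```python
def derive_cpmvs_from_neighbor(v2, v3, v4, neighb, curr):
    """Equation 2: current-CU CPMVs (v0, v1) from a neighbor's three CPMVs.

    v2, v3, v4: top-left, top-right and bottom-left CPMVs of the neighbor, (vx, vy).
    neighb, curr: (X, Y, W, H) = top-left position and size of each CU.
    """
    x_n, y_n, w_n, h_n = neighb
    x_c, y_c, w_c, _ = curr
    fy = (y_c - y_n) / h_n  # vertical offset, normalized by the neighbor height
    fx = (x_c - x_n) / w_n  # horizontal offset, normalized by the neighbor width
    v0 = tuple(v2[i] + (v4[i] - v2[i]) * fy + (v3[i] - v2[i]) * fx for i in range(2))
    v1 = tuple(v0[i] + (v3[i] - v2[i]) * (w_c / w_n) for i in range(2))
    return v0, v1


# Neighbor at (0, 0), 16x16, describing a zoom; current 8x8 CU lies to its right.
print(derive_cpmvs_from_neighbor((0, 0), (4, 0), (0, 4), (0, 0, 16, 16), (16, 0, 8, 8)))
# ((4.0, 0.0), (6.0, 0.0))
```

In this example the neighbor's model gives a horizontal motion of 0.25 pel per pel, so the derived vectors agree with that model evaluated at the current CU's top-left (x = 16) and top-right (x = 24) corners.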

When the control point motion vectors $\overrightarrow{v_0}$ and $\overrightarrow{v_1}$ of the current CU are obtained, the motion field inside the current CU being encoded or decoded is computed on a 4×4 sub-CU basis, through the model of Equation 1 as described above in connection with FIG. 6.

Accordingly, a general aspect of at least one embodiment aims to improve the performance of the Affine Merge mode in JEM so that the compression performance of a considered video codec may be improved. Therefore, in at least one embodiment, an augmented and improved affine motion compensation apparatus and method are presented, for example, for Coding Units that are coded in Affine Merge mode. The proposed augmented and improved affine mode includes evaluating multiple predictor candidates in the Affine Merge mode.

As discussed before, in the current JEM, the first neighboring CU coded in Affine mode among the surrounding CUs is selected to predict the affine motion model associated with the current CU being encoded or decoded. That is, the first neighboring CU candidate among the ordered set (A, B, C, D, E) of FIG. 8B that is coded in Affine mode is selected to predict the affine motion model of the current CU.

Accordingly, at least one embodiment selects the Affine Merge prediction candidate that provides the best coding efficiency when coding the current CU in Affine Merge mode, instead of using just the first one in the ordered set as noted above. The improvements of this embodiment, at a general level, therefore comprise, for example:

-   constructing a set of multiple Affine Merge predictor candidates that is likely to provide a good set of candidates for the prediction of an affine motion model of a CU (for encoder/decoder);
-   selecting one predictor for the current CU's control point motion vectors among the constructed set (for encoder/decoder); and/or
-   signaling/decoding the index of the current CU's control point motion vector predictor (for encoder/decoder).

Accordingly, FIG. 10 illustrates an exemplary encoding method 1000 according to a general aspect of at least one embodiment. At 1010, the method 1000 determines, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates. At 1020, the method 1000 selects a predictor candidate from the set of predictor candidates. At 1030, the method 1000 determines, for the selected predictor candidate from the set of predictor candidates, one or more corresponding control point motion vectors for the block. At 1040, the method 1000 determines for the selected predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for the selected predictor candidate, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being encoded. At 1050, the method 1000 encodes the block based on the corresponding motion field for the selected predictor candidate from the set of predictor candidates. At 1060, the method 1000 encodes an index for the selected predictor candidate from the set of predictor candidates.

FIG. 11 illustrates another exemplary encoding method 1100 according to a general aspect of at least one embodiment. At 1110, the method 1100 determines, for a block being encoded in a picture, a set of predictor candidates. At 1120, the method 1100 determines, for each of multiple predictor candidates from the set of predictor candidates, one or more corresponding control point motion vectors for the block. At 1130, the method 1100 determines for each of the multiple predictor candidates, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for each of the multiple predictor candidates from the set of predictor candidates. At 1140, the method 1100 evaluates the multiple predictor candidates according to one or more criteria and based on the corresponding motion field. At 1150, the method 1100 selects a predictor candidate from the multiple predictor candidates based on the evaluation. At 1160, the method 1100 encodes an index for the selected predictor candidate from the set of predictor candidates.

FIG. 12 illustrates an exemplary decoding method 1200 according to a general aspect of at least one embodiment. At 1210, the method 1200 receives, for a block being decoded in a picture, an index corresponding to a particular predictor candidate. In various embodiments, the particular predictor candidate has been selected at an encoder, and the index allows one of multiple predictor candidates to be selected. At 1220, the method 1200 determines, for the particular predictor candidate, one or more corresponding control point motion vectors for the block being decoded. At 1230, the method 1200 determines for the particular predictor candidate, based on the one or more corresponding control point motion vectors, a corresponding motion field. In various embodiments, the motion field is based on a motion model, wherein the corresponding motion field identifies motion vectors used for prediction of sub-blocks of the block being decoded. At 1240, the method 1200 decodes the block based on the corresponding motion field.

FIG. 13 illustrates another exemplary decoding method 1300 according to a general aspect of at least one embodiment. At 1310, the method 1300 retrieves, for a block being decoded in a picture, an index corresponding to a selected predictor candidate. As also shown at 1310, the selected predictor candidate has been selected at an encoder by: determining, for a block being encoded in a picture, a set of predictor candidates; determining, for each of multiple predictor candidates from the set of predictor candidates, one or more corresponding control point motion vectors for the block being encoded; determining for each of the multiple predictor candidates, based on the one or more corresponding control point motion vectors, a corresponding motion field based on a motion model for each of the multiple predictor candidates from the set of predictor candidates; evaluating the multiple predictor candidates according to one or more criteria and based on the corresponding motion field; selecting a predictor candidate from the multiple predictor candidates based on the evaluating; and encoding an index for the selected predictor candidate from the set of predictor candidates. At 1320, the method 1300 decodes the block based on the index corresponding to the selected predictor candidate.

FIG. 14 illustrates the detail of an embodiment of a process 1400 used to predict the affine motion field of a current CU being encoded or decoded in the existing Affine Merge mode in JEM. The input 1401 to this process 1400 is the current Coding Unit for which one wants to generate the affine motion field of the sub-blocks as shown in FIG. 7. At 1410, the Affine Merge CPMVs for the current block are obtained with the selected predictor candidate as explained above in connection with, e.g., FIG. 6, FIG. 7, FIG. 8B, and FIG. 9. The derivation of this predictor candidate is also explained in more detail later with respect to FIG. 15.

As a result, at 1420, the top-left and top-right control point motion vectors $\overrightarrow{v_0}$ and $\overrightarrow{v_1}$ are then used to compute the affine motion field associated with the current CU. This consists of computing a motion vector for each 4×4 sub-block according to Equation 1 as explained before. At 1430 and 1440, once the motion field is obtained for the current CU, the temporal prediction of the current CU takes place, involving 4×4 sub-block based motion compensation and then OBMC (Overlapped Block Motion Compensation). At 1450 and 1460, the current CU is coded and reconstructed, successively with and without residual data. A mode is selected based on the RD competition, that mode is used to encode the current CU, and an index for that mode is also encoded in various embodiments.

In at least one implementation, a residual flag is used. At 1450, a flag is activated (noResidual=0), indicating that the coding is done with residual data. At 1460, the current CU is fully coded and reconstructed (with residual), giving the corresponding RD cost. Then the flag is deactivated (1480, 1485, noResidual=1), indicating that the coding is done without residual data, and the process goes back to 1460 where the CU is coded (without residual), giving the corresponding RD cost. The lower of the two previous RD costs (1470, 1475) indicates whether the residual must be coded or not (normal or skip). Method 1400 ends at 1499. This best RD cost is then put in competition with other coding modes. Rate distortion determination will be explained in more detail below.

FIG. 15 shows the detail of an embodiment of a process 1500 used to predict the one or more control points of the current CU's affine motion field. This consists of searching (1510, 1520, 1530, 1540, 1550) for a CU that has been coded/decoded in Affine mode among the spatial positions (A, B, C, D, E) of FIG. 8B. If none of the searched spatial positions is coded in Affine mode, then a variable indicating the number of candidate positions, for example, numValidMergeCand, is set (1560) to 0. Otherwise, the first position that corresponds to a CU in Affine mode is selected (1515, 1525, 1535, 1545, 1555). The process 1500 then consists of computing control point motion vectors that will be used later to generate the affine motion field assigned to the current CU, and setting (1580) numValidMergeCand to 1. This control point computation proceeds as follows. The CU that contains the selected position is determined. It is one of the neighbor CUs of the current CU, as explained before. Next, the 3 CPMVs $\overrightarrow{v_2}$, $\overrightarrow{v_3}$ and $\overrightarrow{v_4}$ from the top-left, top-right and bottom-left corners inside the selected neighbor CU are retrieved (or determined) as explained before in connection with FIG. 9. Finally, the top-left and top-right CPMVs $\overrightarrow{v_0}$ and $\overrightarrow{v_1}$ of the current CU are derived (1570) according to Equation 2, as explained before in connection with FIG. 9. Method 1500 ends at 1599.
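The baseline search of process 1500 can be sketched as below, for contrast with the embodiments that follow. The representation of neighboring CUs as a dictionary with an `affine` flag is a hypothetical stand-in for the codec's data structures; only the scan order (A, B, C, D, E) and the resulting value of numValidMergeCand are taken from the description.

```python
POSITIONS = ("A", "B", "C", "D", "E")  # scan order of FIG. 8B


def first_affine_candidate(neighbors):
    """Baseline JEM search: stop at the first neighbor coded in Affine mode.

    neighbors: hypothetical dict {position: {"affine": bool}} (missing or None
    means the position is unavailable).
    Returns (selected_position, numValidMergeCand).
    """
    for pos in POSITIONS:
        cu = neighbors.get(pos)
        if cu is not None and cu["affine"]:
            return pos, 1
    return None, 0


neighbors = {"A": None, "B": {"affine": False}, "C": {"affine": True}, "E": {"affine": True}}
print(first_affine_candidate(neighbors))  # ('C', 1): E is never considered
```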

The present inventors have recognized that one aspect of the existing Affine Merge process described above is that it systematically employs one and only one motion vector predictor to propagate an affine motion field from a surrounding causal (i.e., already encoded or decoded) and neighboring CU towards a current CU. In various situations, the present inventors have further recognized that this aspect can be disadvantageous because, for example, it does not necessarily select the optimal motion vector predictor. Moreover, the choice of this predictor consists only of the first causal and neighboring CU coded in Affine mode in the ordered set (A, B, C, D, E), as already noted before. In various situations, the present inventors have further recognized that this limited choice can be disadvantageous because, for example, a better predictor might be available. Therefore, the existing process in the current JEM does not consider the fact that several potential causal and neighboring CUs around the current CU may also have used affine motion, and that a CU other than the first one found to have used affine motion may be a better predictor for the current CU's motion information.

Therefore, the present inventors have recognized the potential advantages of several ways to improve the prediction of the current CU's affine motion vectors that are not exploited by the existing JEM codecs. According to a general aspect of at least one embodiment, such advantages are illustrated in FIG. 16 and FIG. 17, as explained below.

In both FIG. 16 and FIG. 17, the current CU being encoded or decoded is the large one in the middle, respectively 1610 in FIG. 16 and 1710 in FIG. 17. The two potential predictor candidates correspond to positions A and C of FIG. 8B, and are shown respectively as predictor candidates 1620 in FIG. 16 and 1720 in FIG. 17. In particular, FIG. 16 illustrates the potential motion fields of the current block 1610 being coded or decoded if the selected predictor candidate is located at the left position (position A of FIG. 8B). Likewise, FIG. 17 illustrates the potential motion fields of the current block 1710 being coded or decoded if the selected predictor candidate is located at the top-right position (i.e., position C of FIG. 8B). As shown in the illustrative figures, depending on which Affine Merge predictor is chosen, different sets of motion vectors for the sub-blocks may be generated for the current CU. Therefore, the present inventors recognize that one or more criteria, such as, e.g., a Rate Distortion (RD) optimized choice between these two candidates, may help in improving the coding/decoding performance of the current CU in Affine Merge mode.

Therefore, one general aspect of at least one embodiment consists of selecting, among a set of multiple candidates, a better motion predictor candidate to derive the CPMVs of a current CU being encoded or decoded. On the encoder side, the candidate used to predict the current CPMVs is chosen according to a rate distortion cost criterion, according to one aspect of one exemplary embodiment. Its index is then coded in the output bit-stream for the decoder, according to another aspect of another exemplary embodiment.

According to another aspect of another exemplary embodiment, in the decoder, the set of candidates may be built, and a predictor may be selected from the set, in the same way as on the encoder side. In such an embodiment, no index needs to be coded in the output bit-stream. Another embodiment of the decoder avoids building the set of candidates, or at least avoids selecting a predictor from the set as in the encoder, and simply decodes the index corresponding to the selected candidate from the bit-stream to derive the corresponding relevant data.

According to another aspect of another exemplary embodiment, the CPMVs used herein are not limited to the two at the top-right and top-left positions of the current CU being coded or decoded, as shown in FIG. 6. Other embodiments comprise, e.g., only one vector or more than two vectors, and the positions of these CPMVs are, e.g., at other corner positions, or at any positions in or out of the current block, as long as it is possible to derive a motion field, for example at the position(s) of the center of the corner 4×4 sub-blocks, or at the internal corner of the corner 4×4 sub-blocks.

In an exemplary embodiment, the set of potential candidate predictors being investigated is identical to the set of positions (A, B, C, D, E) used to retrieve the CPMV predictor in the existing Affine Merge mode in JEM, as illustrated in FIG. 8B. FIG. 18 illustrates the details of one exemplary selection process 1800 for selecting the best candidate to predict a current CU's affine motion model according to a general aspect of this embodiment. However, other embodiments use a set of predictor positions that is different from A, B, C, D, E, and that can include fewer or more elements in the set.

As shown at 1801, the input to this exemplary embodiment 1800 is also information of the current CU being encoded or decoded. At 1810, a set of multiple Affine Merge predictor candidates is built, according to the algorithm 1500 of FIG. 15, which was explained before. Algorithm 1500 of FIG. 15 includes gathering all neighboring positions (A, B, C, D, E) shown in FIG. 8B that correspond to a causal CU which has been coded in Affine mode into a set of candidates for the prediction of the current CU's affine motion. Thus, instead of stopping when a causal affine CU is found, the process 1800 stores all possible candidates for the affine motion model propagation from a causal CU to the current CU, for all of the multiple motion predictor candidates in the set.
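A minimal sketch of step 1810, using as an assumption the same hypothetical dictionary representation of neighboring CUs as above, simply keeps scanning after the first hit, so that every affine neighbor becomes a candidate and the number of valid candidates is the length of the returned list:

```python
POSITIONS = ("A", "B", "C", "D", "E")  # scan order of FIG. 8B


def all_affine_candidates(neighbors):
    """Collect every neighbor coded in Affine mode instead of stopping at the first.

    neighbors: hypothetical dict {position: {"affine": bool}}; missing or None
    means the position is unavailable.
    """
    candidates = []
    for pos in POSITIONS:
        cu = neighbors.get(pos)
        if cu is not None and cu["affine"]:
            candidates.append(pos)
    return candidates


neighbors = {"A": None, "B": {"affine": False}, "C": {"affine": True}, "E": {"affine": True}}
print(all_affine_candidates(neighbors))  # ['C', 'E'], so two candidates are kept
```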

Once the process of FIG. 15 is done, as shown at 1810 of FIG. 18, the process 1800 of FIG. 18, at 1820, computes the top-left and top-right corner CPMVs predicted from each candidate of the set provided at 1810. This process of 1820 is further detailed and illustrated by FIG. 19.

Again, FIG. 19 shows the detail of 1820 in FIG. 18 and includes a loop over each candidate determined and found in the preceding step (1810 of FIG. 18). For each Affine Merge predictor candidate, the CU that contains the spatial position of that candidate is determined. Then, for each reference list L0 and L1 (in the case of a B slice), the control point motion vectors $\overrightarrow{v_0}$ and $\overrightarrow{v_1}$ useful to produce the current CU's motion field are derived according to Equation 2. These two CPMVs for each candidate are stored in the set of candidate CPMVs.

Once the process of FIG. 19 is done and the process has returned to FIG. 18, a loop 1830 over each Affine Merge predictor candidate is performed. It may select, for example, the CPMV candidate that leads to the lowest rate distortion cost. Inside the loop 1830 over each candidate, another loop 1840, which is similar to the process shown in FIG. 14, is used to code the current CU with each CPMV candidate as explained before. The algorithm of FIG. 18 ends when all candidates have been evaluated, and its output may comprise the index of the best predictor. As indicated before, as an example, the candidate with the minimum rate distortion cost may be selected as the best predictor. Various embodiments use the best predictor to encode the current CU, and certain embodiments also encode an index for the best predictor.

One example of a determination of the rate distortion cost is defined as follows, as is well known to a person skilled in the art:

$RD_{cost} = D + \lambda \times R$

where D represents the distortion (typically an L2 distance) between the original block and a reconstructed block obtained by encoding and decoding the current CU with the considered candidate; R represents the rate cost, e.g., the number of bits generated by coding the current block with the considered candidate; and λ is the Lagrange parameter, which represents the rate target at which the video sequence is being encoded.
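The selection performed by loop 1830 can be sketched as follows, assuming each candidate has already been used to code the CU and has produced a distortion D and a rate R. The helper names are hypothetical, D is computed here as a sum of squared differences (a squared L2 distance), and ties are resolved in favor of the lower index, which is an assumption rather than something the description specifies. The returned index is what an encoder embodiment would signal.

```python
def sse(original, reconstructed):
    """Sum of squared differences between two sample sequences (squared L2 distortion)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def rd_cost(distortion, rate_bits, lam):
    """RD_cost = D + lambda * R."""
    return distortion + lam * rate_bits


def select_best_candidate(trials, lam):
    """trials[i] = (D, R) obtained by coding the CU with candidate i.

    Returns (best_index, best_cost); on ties, min() keeps the lower index.
    """
    costs = [rd_cost(d, r, lam) for d, r in trials]
    best = min(range(len(costs)), key=lambda i: costs[i])
    return best, costs[best]


trials = [(100, 10), (80, 30), (90, 12)]
print(select_best_candidate(trials, 1.0))  # (2, 102.0)
print(select_best_candidate(trials, 0.5))  # (1, 95.0)
```

The two calls show how the Lagrange parameter shifts the trade-off: a smaller λ favors the low-distortion candidate even though it costs more bits.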

Another exemplary embodiment is described below. This exemplary embodiment aims at further improving the coding performance of the Affine Merge mode by extending the set of Affine Merge candidates compared to the existing JEM. This exemplary embodiment may be executed on both the encoder and the decoder sides, in a similar manner, to extend the set of candidates. Accordingly, in one non-limiting aspect, some additional predictor candidates may be used to build the set of multiple Affine Merge candidates. The additional candidates may be taken from additional spatial positions such as, e.g., A′ 2110 and B′ 2120 surrounding the current CU 2100 as illustrated in FIG. 21. Other embodiments use yet further spatial positions along, or in proximity to, one of the edges of the current CU 2100.

FIG. 22 illustrates an exemplary algorithm 2200 that corresponds to the embodiment using the additional spatial positions A′ 2110 and B′ 2120 shown in FIG. 21 and described above. For example, the algorithm 2200 includes testing the new candidate position A′ if position A is not a valid Affine Merge prediction candidate (e.g., is not in a CU coded in Affine mode) at 2210 to 2230 of FIG. 22. Likewise, for example, it also tests the position B′ if the position B does not provide any valid candidate (e.g., is not in a CU coded in Affine mode) at 2240 to 2260 of FIG. 22. The remaining aspects of the exemplary process 2200 to build a set of Affine Merge candidates are essentially unchanged compared to FIG. 19 as shown and explained before.
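Algorithm 2200 can be sketched in Python as below. The sketch assumes, for illustration only, that positions C, D and E have no substitute, that the scan follows the order A, B, C, D, E, and that neighbors are represented by a hypothetical dictionary keyed by position name (with A′ and B′ written `A'` and `B'`).

```python
def affine_candidates_with_fallback(neighbors):
    """Test A' only when A is not a valid affine candidate, and B' only when B is not.

    neighbors: hypothetical dict {position: {"affine": bool}}; missing or None
    means the position is unavailable. C, D and E have no fallback (assumption).
    """
    def is_affine(pos):
        cu = neighbors.get(pos)
        return cu is not None and cu["affine"]

    candidates = []
    for pos, fallback in (("A", "A'"), ("B", "B'"), ("C", None), ("D", None), ("E", None)):
        if is_affine(pos):
            candidates.append(pos)
        elif fallback is not None and is_affine(fallback):
            candidates.append(fallback)
    return candidates


neighbors = {
    "A": {"affine": False}, "A'": {"affine": True},
    "B": {"affine": True}, "B'": {"affine": True},  # B' ignored because B is valid
    "D": {"affine": True},
}
print(affine_candidates_with_fallback(neighbors))  # ["A'", 'B', 'D']
```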

In another exemplary embodiment, existing merge candidate positions are considered first, before evaluating newly added positions. The added positions are evaluated only if the set of candidates contains fewer candidates than a maximum number of merge candidates, for example, 5 or 7. The maximum number may be predetermined or variable. This exemplary embodiment is detailed by an exemplary algorithm 2300 of FIG. 23.
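A sketch of this capped variant, assuming for illustration that the added positions are A′ and B′ (written `A'` and `B'`), tested in that order, and that neighbors use the same hypothetical dictionary representation; `max_cands` stands for the predetermined or variable maximum:

```python
BASE_POSITIONS = ("A", "B", "C", "D", "E")
EXTRA_POSITIONS = ("A'", "B'")  # assumed set and order of added positions


def affine_candidates_capped(neighbors, max_cands=5):
    """Existing positions first; added positions only while the set is not full."""
    def is_affine(pos):
        cu = neighbors.get(pos)
        return cu is not None and cu["affine"]

    candidates = [p for p in BASE_POSITIONS if is_affine(p)][:max_cands]
    for pos in EXTRA_POSITIONS:
        if len(candidates) >= max_cands:
            break  # set already full: added positions are not evaluated
        if is_affine(pos):
            candidates.append(pos)
    return candidates


everything = {p: {"affine": True} for p in BASE_POSITIONS + EXTRA_POSITIONS}
print(affine_candidates_capped(everything, max_cands=5))  # ['A', 'B', 'C', 'D', 'E']
print(affine_candidates_capped(everything, max_cands=7))  # adds "A'" and "B'"
```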

According to another exemplary embodiment, additional candidates, called temporal candidates, are added to the set of predictor candidates. These temporal candidates may be used, for example, if no spatial candidates have been found as described above or, in a variant, if the size of the set of Affine Merge candidates has not reached a maximum value, also as described before. Other embodiments use temporal candidates before adding spatial candidates to the set. For example, temporal candidates to predict the control point motion vectors of a current CU may be retrieved from one or more of the reference pictures available or used for the current picture. The temporal candidates may be taken, for example, at positions corresponding to the bottom-right neighboring CU of the current CU in each of the reference pictures. This corresponds to the candidate position F 2410 for a current CU 2400 being encoded or decoded as shown in FIG. 24.

In an embodiment, for example, for each reference picture of each reference picture list, the affine flag associated with the block at position F 2410 of FIG. 24 in the considered reference picture is tested. If true, then the corresponding CU contained in that reference picture is added to the current set of Affine Merge candidates.

In a further variant, temporal candidates are retrieved from the reference pictures at the spatial position corresponding to the top-left corner of the current CU 2400. This position corresponds to the candidate position G 2420 of FIG. 24.

In a further variant, temporal candidates are retrieved from the reference pictures at the position corresponding to the bottom-right neighboring CU. Then, if the set of candidates contains fewer candidates than a prefixed maximum number of merge candidates, e.g., 5 or 7, the temporal candidates corresponding to the top-left corner G 2420 of the current CU are retrieved. In other embodiments, the temporal candidates are obtained from a position, in one or more reference pictures, corresponding to a different portion (other than G 2420) of the current CU 2400, or corresponding to another neighboring CU (other than F 2410) of the current CU 2400.

In addition, an exemplary derivation process for the control point motion vectors based on a temporal candidate proceeds as follows. For each temporal candidate contained in the constructed set, the block (tempCU) containing the temporal candidate in its reference picture is identified. Then the three CPMVs located at the top-left, top-right and bottom-left corners of the identified temporal CU are scaled. This scaling takes into account the relationship between the POC (Picture Order Count) of tempCU, the POC of the reference picture of tempCU (the difference is denoted tempDist), the POC of the current CU, and the POC of the reference picture of the current CU (the difference is denoted curDist). For example, the CPMVs can be scaled by the ratio between the distances (curDist/tempDist), so that a motion vector spanning tempDist is rescaled to span curDist. Once these three scaled CPMVs are obtained, the two Control Point Motion Vectors for the current CU are derived according to Equation 2 as described before.
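The scaling step can be sketched as below. This sketch multiplies each vector by curDist/tempDist, following the usual HEVC-style temporal scaling convention in which a vector spanning tempDist is stretched or shrunk to span curDist; the function name and argument layout are hypothetical, and a real codec would use clipped fixed-point arithmetic. The scaled vectors would then be passed to Equation 2 to obtain the current CU's two CPMVs.

```python
def scale_temporal_cpmvs(cpmvs, poc_temp_cu, poc_temp_ref, poc_cur, poc_cur_ref):
    """Scale the corner CPMVs of a temporal CU to the current CU's temporal distance.

    tempDist = POC(tempCU) - POC(reference picture of tempCU)
    curDist  = POC(current CU) - POC(reference picture of current CU)
    Convention assumed here: multiply by curDist / tempDist.
    """
    temp_dist = poc_temp_cu - poc_temp_ref
    cur_dist = poc_cur - poc_cur_ref
    if temp_dist == 0:
        raise ValueError("tempDist must be non-zero")
    s = cur_dist / temp_dist
    return [(vx * s, vy * s) for vx, vy in cpmvs]


# tempCU points 8 pictures back, the current CU only 4: vectors are halved.
print(scale_temporal_cpmvs([(4, 2), (8, -2), (0, 6)], 8, 0, 4, 0))
# [(2.0, 1.0), (4.0, -1.0), (0.0, 3.0)]
```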

Another exemplary embodiment includes adding a mean Control Point Motion Vector pair computed as a function of the control point motion vectors derived from each candidate. The exemplary process here is detailed by an exemplary algorithm 2500 shown in FIG. 25. A loop 2510 is used for each Affine Merge predictor candidate in the set constructed for the considered reference picture list.

Then, at 2520, for each reference picture list Lx, successively equal to L0 and then L1 (if in a B slice), if the current candidate has valid CPMVs for list Lx:

-   Initialize the pair of motion vectors ($\overrightarrow{meanVLx_0}$, $\overrightarrow{meanVLx_1}$) to ($\vec{0}$, $\vec{0}$);
-   For each candidate:
    -   derive the CPMVs ($\overrightarrow{v_0}$, $\overrightarrow{v_1}$) from the current candidate's CPMVs according to Equation 2;
    -   add ($\overrightarrow{v_0}$, $\overrightarrow{v_1}$) to the pair ($\overrightarrow{meanVLx_0}$, $\overrightarrow{meanVLx_1}$);
-   divide the pair of motion vectors ($\overrightarrow{meanVLx_0}$, $\overrightarrow{meanVLx_1}$) by the number of candidates for list Lx;
-   assign to the motion vectors $\overrightarrow{meanVLx_0}$ and $\overrightarrow{meanVLx_1}$ a reference picture index equal to the minimum reference picture index observed among all the candidates of list Lx (for example, when Lx is list 0, the reference index of the mean pair is set to the minimum reference index observed among all the list 0 candidates, and likewise for list 1);
-   add the obtained mean pair of motion vectors ($\overrightarrow{meanVLx_0}$, $\overrightarrow{meanVLx_1}$) to the set of candidate CPMVs used to generate the affine motion field of the current CU for list Lx.
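The averaging steps for one list Lx can be sketched as follows. This is a minimal sketch under assumed inputs: each entry holds a candidate's $\overrightarrow{v_0}$ and $\overrightarrow{v_1}$ (already derived via Equation 2) together with its reference picture index for Lx, and the function name is hypothetical.

```python
def mean_cpmv_pair(candidates):
    """Mean CPMV pair for one reference picture list Lx.

    candidates: list of (v0, v1, ref_idx), with v0 and v1 given as (vx, vy), one
    entry per candidate holding valid CPMVs for Lx.
    Returns (mean_v0, mean_v1, ref_idx), or None when no candidate is valid.
    """
    if not candidates:
        return None
    n = len(candidates)
    sum_v0 = [0.0, 0.0]  # running sum, initialized to the zero vector
    sum_v1 = [0.0, 0.0]
    for v0, v1, _ in candidates:
        for i in range(2):
            sum_v0[i] += v0[i]
            sum_v1[i] += v1[i]
    mean_v0 = (sum_v0[0] / n, sum_v0[1] / n)
    mean_v1 = (sum_v1[0] / n, sum_v1[1] / n)
    ref_idx = min(ref for _, _, ref in candidates)  # minimum reference index
    return mean_v0, mean_v1, ref_idx


print(mean_cpmv_pair([((0, 0), (4, 0), 1), ((2, 2), (6, 2), 0)]))
# ((1.0, 1.0), (5.0, 1.0), 0)
```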

Using the algorithm 2500 and/or other embodiments, the set of Affine Merge candidates is further enriched, and contains the mean motion information computed from the CPMVs derived for each candidate inserted into the set of candidates, according to the previous embodiments as described in the preceding sections.

Because several candidates may lead to the same CPMVs for the current CU, the above mean candidate may result in a weighted average pair of CPMV motion vectors. Indeed, the process described above computes the average of the CPMVs collected so far, regardless of their uniqueness in the complete set of CPMVs. Therefore, a variant of this embodiment consists in adding yet another candidate to the set of CPMV prediction candidates: the average CPMV of the set of unique collected CPMVs (in addition to the weighted mean CPMV described above). This provides a further candidate CPMV in the set of predictor candidates to produce the Affine Motion field of the current CU.

For example, consider the situation in which the following five spatial candidates (L, T, TR, BL, TL) are all available and affine. However, the 3 left positions (L, BL, TL) are within the same neighboring CU. From each spatial position, candidate CPMVs can be obtained. The first mean is then equal to the sum of these 5 CPMVs divided by 5 (even though several are identical). In the second mean, only distinct CPMVs are considered, so the 3 left ones (L, BL, TL) are considered only once, and the second mean is equal to the sum of the 3 distinct CPMVs (L, T, TR) divided by 3. In the first mean, the redundant CPMV is added 3 times, which gives it more weight. Using equations, we may write $\mathrm{mean}_1 = (L + T + TR + BL + TL)/5$ with $L = BL = TL$, so $\mathrm{mean}_1 = (3L + T + TR)/5$, while $\mathrm{mean}_2 = (L + T + TR)/3$.
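The difference between the two means in this example can be checked numerically. The sketch below is illustrative only: it considers a single reference list, represents each CPMV pair as ((v0x, v0y), (v1x, v1y)), and uses hypothetical translational values for L, T and TR. Duplicates are removed with dict.fromkeys, which keeps the first occurrence and preserves order.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Pair = Tuple[Vec, Vec]


def mean_pair(pairs: List[Pair]) -> Pair:
    """Component-wise average of a list of CPMV pairs."""
    n = len(pairs)
    v0 = (sum(p[0][0] for p in pairs) / n, sum(p[0][1] for p in pairs) / n)
    v1 = (sum(p[1][0] for p in pairs) / n, sum(p[1][1] for p in pairs) / n)
    return v0, v1


def weighted_mean(pairs: List[Pair]) -> Pair:
    """First mean: every collected pair counts, so repeated pairs weigh more."""
    return mean_pair(pairs)


def unique_mean(pairs: List[Pair]) -> Pair:
    """Second mean: each distinct pair counts once."""
    return mean_pair(list(dict.fromkeys(pairs)))


# Hypothetical CPMV pairs; BL and TL belong to the same CU as L.
L = ((3.0, 0.0), (3.0, 0.0))
T = ((0.0, 0.0), (0.0, 0.0))
TR = ((0.0, 6.0), (0.0, 6.0))
collected = [L, T, TR, L, L]  # positions L, T, TR, BL, TL
```

Here weighted_mean(collected) gives ((1.8, 1.2), (1.8, 1.2)), i.e. (3L + T + TR)/5, whereas unique_mean(collected) gives ((1.0, 2.0), (1.0, 2.0)), i.e. (L + T + TR)/3.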

The two candidate means described above are bi-directional as soon as a considered candidate holds a motion vector for a reference image in list 0 and another in list 1. In another variant, uni-directional means may be added. From the weighted and the unique means, four uni-directional candidates may be constructed by picking motion vectors from list 0 and list 1 independently.
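A minimal sketch of this split, assuming a hypothetical candidate representation in which each mean is a dictionary mapping "L0" and "L1" to its motion information (or None when absent). Each bi-directional mean yields one list-0-only and one list-1-only candidate, giving up to four uni-directional candidates from the two means.

```python
from typing import Dict, List, Optional

Cand = Dict[str, Optional[object]]


def uni_directional_means(weighted: Cand, unique: Cand) -> List[Cand]:
    """Split each bi-directional mean into a list-0-only and a list-1-only candidate."""
    out: List[Cand] = []
    for mean in (weighted, unique):
        # Only a mean holding motion for both lists is split.
        if mean.get("L0") is not None and mean.get("L1") is not None:
            out.append({"L0": mean["L0"], "L1": None})
            out.append({"L0": None, "L1": mean["L1"]})
    return out
```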

One advantage of the exemplary candidate set extension methods described in this application is an increase in the variety of the set of candidate Control Point Motion Vectors that may be used to construct the affine motion field associated with a given CU. Thus, the present embodiments provide a technological advancement in the computing technology of video content encoding and decoding. For example, the present embodiments improve the rate distortion performance provided by the Affine Merge coding mode in JEM. In this way, the overall rate distortion performance of the considered video codec is improved.

A further exemplary embodiment may be provided to modify the process of FIG. 18. The embodiment includes a fast evaluation of the performance of each CPMV candidate, through the following approximate distortion and rate computation. For each candidate in the set of CPMVs, the motion field of the current CU is computed, and the 4×4 sub-block-based temporal prediction of the current CU is performed. Next, the distortion is computed as the SATD (sum of absolute transformed differences) between the predicted CU and the original CU. The rate cost is obtained as an approximate number of bits linked to the signaling of the merge index of the considered candidate. A rough (approximate) RD cost is then obtained for each candidate. In one embodiment, the final selection is based on the approximate RD cost. In another embodiment, a subset of candidates, i.e., the candidates that have the lowest approximate RD costs, then undergoes the full RD search. The advantage of these embodiments is that they limit the encoder-side complexity increase that arises from the search for the best Affine Merge predictor candidate.
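The fast evaluation can be sketched as follows. This is an illustration under several assumptions: the per-candidate predictions (from the 4×4 sub-block motion compensation, not shown) are supplied as 2-D integer lists whose dimensions are multiples of 4; SATD uses an unnormalized 4×4 Hadamard transform; the merge-index rate is approximated with a truncated unary code; and the Lagrangian weight lam and the shortlist size keep are hypothetical parameters. The shortlisted indices would then go through the full RD search.

```python
from typing import List, Sequence

Block = List[List[int]]

# 4x4 Hadamard matrix (rows are mutually orthogonal +/-1 vectors).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def satd4x4(diff: Block) -> int:
    """Sum of absolute coefficients of H * D * H^T (no normalization)."""
    tmp = [[sum(H4[i][k] * diff[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    coeffs = [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return sum(abs(c) for row in coeffs for c in row)


def satd(pred: Block, orig: Block) -> int:
    """SATD over the whole CU, accumulated on 4x4 tiles."""
    total = 0
    for y in range(0, len(orig), 4):
        for x in range(0, len(orig[0]), 4):
            diff = [[orig[y + r][x + c] - pred[y + r][x + c] for c in range(4)] for r in range(4)]
            total += satd4x4(diff)
    return total


def merge_index_bits(idx: int, num_candidates: int) -> int:
    """Approximate merge-index cost, assuming truncated unary binarization."""
    return idx + 1 if idx < num_candidates - 1 else idx


def shortlist_candidates(preds: Sequence[Block], orig: Block, lam: float, keep: int) -> List[int]:
    """Rank candidates by SATD + lam * bits and return the indices of the `keep` cheapest."""
    n = len(preds)
    costs = [(satd(p, orig) + lam * merge_index_bits(i, n), i) for i, p in enumerate(preds)]
    costs.sort()
    return [i for _, i in costs[:keep]]
```

With keep = 1 this corresponds to selecting directly on the approximate RD cost; a larger keep corresponds to the variant in which a subset undergoes the full RD search.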

Also, according to another general aspect of at least one embodiment, the Affine Inter mode described before may also be improved with all of the current teachings presented herewith by having an extended list of Affine predictor candidates. As described above in connection with FIG. 8A, one or more CPMVPs of an Affine Inter CU are derived from neighboring motion vectors regardless of their coding mode. Therefore, it is possible to take advantage of the affine neighbors, with their affine model, to construct the one or more CPMVPs of the current Affine Inter CU, as in the Affine Merge mode described before. In that case, the considered affine candidates may be the same list as described above for the Affine Merge mode (e.g., not limited to spatial candidates only).

Accordingly, a set of multiple predictor candidates is provided to improve the compression/decompression provided by the current HEVC and JEM by using better predictor candidates. The process will be more efficient, and a coding gain will be observed even if a supplemental index needs to be transmitted.

According to a general aspect of at least one embodiment, the set of Affine Merge candidates (with at least 7 candidates, as in Merge mode) is composed of, e.g.:

- Spatial candidates from (A, B, C, D, E),
- Temporal candidates of the bottom-right co-located position if there are fewer than 5 candidates in the list,
- Temporal candidates of the co-located position if there are fewer than 5 candidates in the list,
- Weighted mean,
- Unique mean,
- Uni-directional means from the weighted mean, if the weighted mean is bi-directional and there are fewer than 7 candidates in the list,
- Uni-directional means from the unique mean, if the unique mean is bi-directional and there are fewer than 7 candidates in the list.
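The construction order in this list can be sketched as below. This is a hypothetical illustration, not the codec's list-construction process: candidates are opaque dictionaries mapping "L0"/"L1" to motion information or None, availability checks and pruning are omitted, and capping the two means at the maximum list size of 7 is an assumption (the list above adds them unconditionally).

```python
from typing import Dict, List, Optional

Cand = Dict[str, Optional[object]]


def is_bi(c: Optional[Cand]) -> bool:
    """True when the candidate holds motion for both reference lists."""
    return c is not None and c.get("L0") is not None and c.get("L1") is not None


def build_affine_merge_list(spatial: List[Cand], temporal_br: List[Cand],
                            temporal_col: List[Cand], weighted: Optional[Cand],
                            unique: Optional[Cand], max_size: int = 7,
                            temporal_limit: int = 5) -> List[Cand]:
    cands: List[Cand] = list(spatial[:max_size])
    # Temporal candidates (bottom-right first, then co-located) while fewer than 5.
    for c in temporal_br + temporal_col:
        if len(cands) < temporal_limit:
            cands.append(c)
    # Weighted mean, then unique mean (capped at max_size: an assumption).
    for m in (weighted, unique):
        if m is not None and len(cands) < max_size:
            cands.append(m)
    # Uni-directional means split from bi-directional means, while fewer than 7.
    for m in (weighted, unique):
        if is_bi(m):
            for uni in ({"L0": m["L0"], "L1": None}, {"L0": None, "L1": m["L1"]}):
                if len(cands) < max_size:
                    cands.append(uni)
    return cands
```

With two spatial, one bottom-right and one co-located temporal candidate plus two bi-directional means, the list fills up at 7 entries with the list-0 part of the weighted mean as its last element.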

Also, in the AMVP case, predictor candidates may be taken from, e.g.:

- Spatial candidates from set (A, B, C, D, E),
- Supplemental spatial candidates from (A′, B′),
- Temporal candidates of the bottom-right co-located position.

TABLE 1 and TABLE 2 below show the improvements over JEM 4.0 (parallel) obtained with exemplary embodiments of some of the proposed solutions. Each table shows the rate reductions obtained for one of the exemplary embodiments described above. In particular, TABLE 1 shows the improvements when the 5 spatial candidates (A, B, C, D, E) shown in FIG. 8B are used as the set of multiple predictor candidates according to an exemplary embodiment described above. TABLE 2 shows the improvements for an exemplary embodiment in which the following order of predictor candidates is used, as described above: spatial candidates first, then temporal candidates if the number of candidates is still less than 5, then means, and finally uni-directional means if the number of candidates is still less than 7. Accordingly, for example, TABLE 2 shows that for this embodiment, the BD (Bjontegaard-Delta) rate reductions for the Y, U, and V samples are respectively 0.22%, 0.26%, and 0.12% for Class D, with almost no increase in the encoding and decoding running times (i.e., 100% and 101%, respectively). Thus, the present exemplary embodiments improve the compression/decompression efficiency while keeping the computational complexity close to that of the existing JEM implementation.

TABLE 1
Results of RDO selection of the best CPMV using spatial candidates over JEM 4.0

Random Access Main 10, over JEM 4.0 (parallel)
            Y         U         V         EncT    DecT
Class C    −0.04%     0.09%    −0.12%    100%    100%
Class D    −0.07%    −0.20%    −0.11%    100%    100%

TABLE 2
Results of RDO selection of the best CPMV using spatial, temporal, means and then uni-directional means candidates over JEM 4.0

Random Access Main 10, over JEM 4.0 (parallel)
            Y         U         V         EncT    DecT
Class C    −0.15%     0.02%    −0.12%    100%    101%
Class D    −0.22%    −0.26%    −0.12%    100%    101%

FIG. 26 illustrates a block diagram of an exemplary system 2600 in which various aspects of the exemplary embodiments may be implemented. The system 2600 may be embodied as a device including the various components described below and is configured to perform the processes described above. Examples of such devices include, but are not limited to, personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. The system 2600 may be communicatively coupled to other similar systems, and to a display via a communication channel as shown in FIG. 26 and as known by those skilled in the art, to implement all or part of the exemplary video systems described above.

Various embodiments of the system 2600 include at least one processor 2610 configured to execute instructions loaded therein for implementing the various processes as discussed above. The processor 2610 may include embedded memory, an input/output interface, and various other circuitry as known in the art. The system 2600 may also include at least one memory 2620 (e.g., a volatile memory device, a non-volatile memory device). The system 2600 may additionally include a storage device 2640, which may include non-volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, a magnetic disk drive, and/or an optical disk drive. The storage device 2640 may comprise an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples. The system 2600 may also include an encoder/decoder module 2630 configured to process data to provide encoded video and/or decoded video, and the encoder/decoder module 2630 may include its own processor and memory.

The encoder/decoder module 2630 represents the module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, such a device may include one or both of the encoding and decoding modules. Additionally, the encoder/decoder module 2630 may be implemented as a separate element of the system 2600 or may be incorporated within one or more processors 2610 as a combination of hardware and software as known to those skilled in the art.

Program code to be loaded onto one or more processors 2610 to perform the various processes described hereinabove may be stored in the storage device 2640 and subsequently loaded onto the memory 2620 for execution by the processors 2610. In accordance with the exemplary embodiments, one or more of the processor(s) 2610, the memory 2620, the storage device 2640, and the encoder/decoder module 2630 may store one or more of the various items during the performance of the processes discussed hereinabove, including, but not limited to, the input video, the decoded video, the bitstream, equations, formulas, matrices, variables, operations, and operational logic.

The system 2600 may also include a communication interface 2650 that enables communication with other devices via a communication channel 2660. The communication interface 2650 may include, but is not limited to, a transceiver configured to transmit and receive data from the communication channel 2660. The communication interface 2650 may include, but is not limited to, a modem or network card, and the communication channel 2660 may be implemented within a wired and/or wireless medium. The various components of the system 2600 may be connected or communicatively coupled together (not shown in FIG. 26) using various suitable connections, including, but not limited to, internal buses, wires, and printed circuit boards.

The exemplary embodiments may be carried out by computer software implemented by the processor 2610 or by hardware, or by a combination of hardware and software. As a non-limiting example, the exemplary embodiments may be implemented by one or more integrated circuits. The memory 2620 may be of any type appropriate to the technical environment and may be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor 2610 may be of any type appropriate to the technical environment, and may encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

Furthermore, one skilled in the art may readily appreciate that the exemplary HEVC encoder 100 shown in FIG. 1 and the exemplary HEVC decoder shown in FIG. 3 may be modified according to the above teachings of the present disclosure in order to implement the disclosed improvements to the existing HEVC standard for achieving better compression/decompression. For example, entropy coding 145, motion compensation 170, and motion estimation 175 in the exemplary encoder 100 of FIG. 1, and entropy decoding 330 and motion compensation 375 in the exemplary decoder of FIG. 3 may be modified according to the disclosed teachings to implement one or more exemplary aspects of the present disclosure, including providing an enhanced affine merge prediction to the existing JEM.

Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Further, this application or its claims may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

Additionally, this application or its claims may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

1. A method for video encoding, comprising: accessing, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates, wherein a predictor candidate corresponds to a spatial or a temporal neighboring block that has been encoded, wherein said block is encoded in an affine merge mode; selecting a predictor candidate from the set of predictor candidates; obtaining, using a plurality of motion vectors associated with the selected predictor candidate from the set of predictor candidates, a set of control point motion vectors for the block; obtaining, based on the set of control point motion vectors, a motion field based on a motion model, wherein the motion field identifies motion vectors used for prediction of all sub-blocks of the block being encoded; encoding the block based on the motion field; and encoding an index for the selected predictor candidate from the set of predictor candidates.
2. The method of claim 1, wherein the motion model is an affine model.
3. The method of claim 1, wherein the set of control point motion vectors for the block are stored separately from the motion vectors of the motion field for the block.
4. The method of claim 3, wherein the stored motion vectors of the motion field are for motion compensation of the block.
5. The method of claim 3, further comprising: accessing a second set of control point motion vectors for the selected predictor candidate, wherein the second set of control point motion vectors are stored separately from motion vectors of all sub-blocks of the selected predictor candidate, wherein the set of control point motion vectors for the block are obtained responsive to the second set of control point motion vectors for the selected predictor candidate, and wherein the motion vectors of all sub-blocks of the selected predictor candidate are for motion compensation of the selected predictor candidate.
6. A method for video decoding, comprising: accessing, for a block being decoded in a picture, an index corresponding to a predictor candidate, wherein the predictor candidate corresponds to a spatial or a temporal neighboring block that has been decoded, and wherein said block is decoded in an affine merge mode; obtaining, using a plurality of motion vectors associated with the predictor candidate, a set of control point motion vectors for the block being decoded; obtaining, based on the set of control point motion vectors, a motion field based on a motion model, wherein the motion field identifies motion vectors used for prediction of all sub-blocks of the block being decoded; and decoding the block based on the motion field.
7. The method of claim 6, wherein the motion model is an affine model.
8. The method of claim 6, wherein the set of control point motion vectors for the block are stored separately from the motion vectors of the motion field for the block.
9. The method of claim 8, wherein the stored motion vectors of the motion field are for motion compensation of the block.
10. The method of claim 8, further comprising: accessing a second set of control point motion vectors for the selected predictor candidate, wherein the second set of control point motion vectors are stored separately from motion vectors of all sub-blocks of the selected predictor candidate, wherein the set of control point motion vectors for the block are obtained responsive to the second set of control point motion vectors for the selected predictor candidate, and wherein the motion vectors of all sub-blocks of the selected predictor candidate are for motion compensation of the selected predictor candidate.
11. An apparatus for video encoding, comprising: one or more processors, wherein said one or more processors are configured to: access, for a block being encoded in a picture, a set of predictor candidates having multiple predictor candidates, wherein a predictor candidate corresponds to a spatial or a temporal neighboring block that has been encoded, wherein said block is encoded in an affine merge mode; select a predictor candidate from the set of predictor candidates; obtain, using a plurality of motion vectors associated with the selected predictor candidate from the set of predictor candidates, a set of control point motion vectors for the block; obtain, based on the set of control point motion vectors, a motion field based on a motion model, wherein the motion field identifies motion vectors used for prediction of all sub-blocks of the block being encoded; encode the block based on the motion field; and encode an index for the selected predictor candidate from the set of predictor candidates.
12. The apparatus of claim 11, wherein the motion model is an affine model.
13. The apparatus of claim 11, wherein the set of control point motion vectors for the block are stored separately from the motion vectors of the motion field for the block.
14. The apparatus of claim 13, wherein the stored motion vectors of the motion field are for motion compensation of the block.
15. The apparatus of claim 13, wherein the one or more processors are further configured to: access a second set of control point motion vectors for the selected predictor candidate, wherein the second set of control point motion vectors are stored separately from motion vectors of all sub-blocks of the selected predictor candidate, wherein the set of control point motion vectors for the block are obtained responsive to the second set of control point motion vectors for the selected predictor candidate, and wherein the motion vectors of all sub-blocks of the selected predictor candidate are for motion compensation of the selected predictor candidate.
16. An apparatus for video decoding, comprising: one or more processors, wherein the one or more processors are configured to: access, for a block being decoded in a picture, an index corresponding to a predictor candidate, wherein the predictor candidate corresponds to a spatial or a temporal neighboring block that has been decoded, and wherein said block is decoded in an affine merge mode; obtain, using a plurality of motion vectors associated with the predictor candidate, a set of control point motion vectors for the block being decoded; obtain, based on the set of control point motion vectors, a motion field based on a motion model, wherein the motion field identifies motion vectors used for prediction of all sub-blocks of the block being decoded; and decode the block based on the motion field.
17. The apparatus of claim 16, wherein the motion model is an affine model.
18. The apparatus of claim 16, wherein the set of control point motion vectors for the block are stored separately from the motion vectors of the motion field for the block.
19. The apparatus of claim 18, wherein the stored motion vectors of the motion field are for motion compensation of the block.
20. The apparatus of claim 18, wherein the one or more processors are further configured to: access a second set of control point motion vectors for the selected predictor candidate, wherein the second set of control point motion vectors are stored separately from motion vectors of all sub-blocks of the selected predictor candidate, wherein the set of control point motion vectors for the block are obtained responsive to the second set of control point motion vectors for the selected predictor candidate, and wherein the motion vectors of all sub-blocks of the selected predictor candidate are for motion compensation of the selected predictor candidate.